Search - Tencent Cloud Developer Community - Tencent Cloud


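From the column 机器之心
A survey that finally explains the LLM-as-a-judge paradigm
Recent advances in large language models (LLMs) have inspired the "LLM-as-a-judge" paradigm, in which LLMs are used to score, rank, or select across a wide range of tasks and applications. … We then introduce a comprehensive taxonomy that explores LLM-as-a-judge along three dimensions: what to judge, how to judge, and where to judge. … llm-as-a-judge/Awesome-LLM-as-a-judge … Article structure: [Figure 1: Structure of the paper] Definition of LLM-as-a-judge: [Figure 2: Definition of LLM-as-a-judge] In this work, we define LLM-as-a-judge according to the differences in its input and output formats. … [Figure 4: LLM-as-a-judge prompting methods] (2) Prompting: prompting techniques can effectively improve both the performance and the efficiency of LLM-as-a-judge.
Edited 2025-02-14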
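From the column CQ品势
Online Judge
Peking University Online Judge (POJ) <http://acm.pku.edu.cn/JudgeOnline/> started later than the others, but problems were added quickly and its problem count is now comparable to ZOJ's. It hosts many online contests, and one of its main strengths is its powerful judging system; in fact, PKU is now China's best ACM site. Zhejiang University Online Judge (ZOJ) <http://acm.zju.edu.cn> is the earliest and best-known OJ in China; many strong competitors solve problems there, and it loads quickly. University of Valladolid (Spain) Online Judge (UVA) <http://acm.uva.es/> is the largest and best-known OJ in the world: its problem set is huge and very varied, its test data is tricky, and top competitors from around the world use it. Ural State University (Russia) Online Judge (URAL) <http://acm.timus.ru/> is another long-established OJ; it has relatively few problems, but every one is a classic. I was already solving problems on it in high school.
Edited 2021-12-07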
From the column calmound
UVA Hangman Judge
In "Hangman Judge," you are to write a program that judges a series of Hangman games. Your task as the "Hangman Judge" is to determine, for each game, whether the contestant wins, loses …
Published 2018-04-11
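From the column Vincent-yuan
The Hangman Game (Hangman Judge)
In this problem, your task is to write a judging program that reads a word and the player's guesses and decides whether the player won ("You win.") or lost ("You lose.") …
Published 2020-06-15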
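
The judging logic behind the two Hangman entries above can be sketched in Python. Both snippets stop before the full rules, so the rules used here are assumptions taken from the usual statement of the problem: each wrong guess adds one stroke and 7 strokes lose, one correct guess reveals every occurrence of that letter, repeating a letter that was already guessed also counts as wrong, and a game that runs out of guesses undecided ends in "You chickened out." The function name hangman_judge is hypothetical.

```python
def hangman_judge(word: str, guesses: str, max_wrong: int = 7) -> str:
    """Judge one Hangman game under the assumed rules described above."""
    hidden = set(word)   # distinct letters not yet revealed
    wrong = 0            # strokes drawn so far
    for ch in guesses:
        if ch in hidden:
            hidden.discard(ch)           # one guess reveals every occurrence of ch
            if not hidden:
                return "You win."
        else:
            # Either a letter not in the word, or a repeat of an earlier guess
            # (a revealed letter is no longer in `hidden`): both count as wrong here.
            wrong += 1
            if wrong >= max_wrong:
                return "You lose."
    return "You chickened out."


print(hangman_judge("cheese", "chese"))       # You win.
print(hangman_judge("cheese", "abcdefg"))     # You chickened out.
print(hangman_judge("cheese", "abcdefghij"))  # You lose.
```

Treating any repeated guess as wrong is the stricter reading; if a repeated wrong letter should count only once, keep a set of missed letters instead of a plain counter.
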
From the column ml
hduoj1073--Online Judge
Online Judge. Time Limit: 2000/1000 MS (Java/Others). Memory Limit: 65536/32768 K (Java/Others). Total … He has now finished everything except the Judge System. If the two files are absolutely the same, the Judge System returns "Accepted"; else if the only differences between the two files are spaces (' '), tabs ('\t'), or enters ('\n'), the Judge System should return "… Output: For each test case, you should output the result the Judge System should return.
Published 2018-03-21
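
The comparison rule in the HDU 1073 statement fits in a few lines of Python. The snippet cuts off before naming the verdicts for the other two cases, so "Presentation Error" and "Wrong Answer" below are assumptions (the usual OJ verdict names), as is the function name judge_output.

```python
# Characters the statement treats as ignorable differences: space, tab, newline.
_IGNORABLE = str.maketrans("", "", " \t\n")


def judge_output(expected: str, actual: str) -> str:
    """Compare a contestant's output with the reference output."""
    if actual == expected:
        return "Accepted"
    # Identical once every space, tab and newline is deleted: only layout differs.
    if actual.translate(_IGNORABLE) == expected.translate(_IGNORABLE):
        return "Presentation Error"   # assumed verdict name
    return "Wrong Answer"             # assumed verdict name


print(judge_output("1 2\n", "1\t2\n\n"))  # Presentation Error
```

Reading the two texts from the input is left out, since the snippet does not show how the files are framed in the input.
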
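From the column 技术集锦
Meaning of terms in Online Judge systems
When practicing on common OJ platforms (for example, Luogu), everyone runs into per-test-case feedback messages. To understand exactly where you went wrong, it is well worth learning what the OJ system's messages mean. Glossary (abbreviation | English full name | Chinese full name): OJ | Online Judge | …
Edited 2022-06-03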
From the column chenjx85的技术专栏
leetcode-657-Judge Route Circle
Given a sequence of its moves, judge if this robot makes a circle, which means it moves back to the original …
Published 2018-05-22
From the column SnailTyan
Find the Town Judge
Snippet (Python, partially elided):
if N == 1: return 1
if len(trust) < N - 1: return -1
judge = {}
for pair in trust:
    people[pair[0]] = people.get(pair[0], 0) + 1
    judge[pair[1]] = judge.get(pair[1], 0) + 1
for key, value in judge.items():
    if value … 1: return i
return -1
Reference: https://leetcode.com/problems/find-the-town-judge
Published 2021-02-05
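
A runnable version of the counting approach in the snippet above. The snippet's final condition is elided, so the check used here is an assumption based on the LeetCode definition (the judge trusts nobody and is trusted by everyone else, that is, by N - 1 people). The function name find_judge is hypothetical, and trust pairs are assumed to be distinct.

```python
from typing import List


def find_judge(n: int, trust: List[List[int]]) -> int:
    """Return the town judge's label (1..n), or -1 if there is none."""
    if n == 1:
        return 1                  # a lone person trusts no one and needs no trusters
    if len(trust) < n - 1:
        return -1                 # too few pairs for anyone to be trusted by n - 1 people
    people = {}                   # person -> how many people they trust
    judge = {}                    # person -> how many people trust them
    for a, b in trust:
        people[a] = people.get(a, 0) + 1
        judge[b] = judge.get(b, 0) + 1
    for key, value in judge.items():
        # Assumed condition: trusted by everyone else and trusting nobody.
        if value == n - 1 and key not in people:
            return key
    return -1


print(find_judge(3, [[1, 3], [2, 3]]))  # 3
```
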
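From the column 小樱的经验随笔
Open Judge 2750: Chickens and Rabbits in a Cage (鸡兔同笼)
2750: Chickens and Rabbits in a Cage. Total time limit: 1000 ms. Memory limit: 65536 kB.
Description: A cage holds chickens and rabbits (a chicken has 2 feet and a rabbit has 4, with no exceptions). Given the total number of feet a in the cage, find the minimum and the maximum number of animals that could be in the cage.
Input: one line containing a positive integer a (a < 32768).
Output: one line containing two positive integers separated by a single space: the minimum number of animals first, then the maximum. If no answer satisfies the requirements, output two 0s separated by a single space.
Sample input: 20
Sample output: 5 10
Problem link: http://bailian.openjudge.c
Published 2018-04-08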
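
The answer follows from a little arithmetic: an odd total is impossible because every animal has an even number of feet; otherwise the most animals means all chickens (a / 2), and the fewest means as many rabbits as possible plus one chicken when 2 feet are left over. A sketch in Python (the function name cage_bounds is hypothetical):

```python
from typing import Tuple


def cage_bounds(a: int) -> Tuple[int, int]:
    """Return (fewest, most) animals for `a` feet, or (0, 0) if impossible."""
    if a % 2 == 1:
        return (0, 0)                  # 2- and 4-legged animals always give an even total
    most = a // 2                      # all chickens
    fewest = a // 4 + (a % 4) // 2     # as many rabbits as possible, one chicken for 2 leftover feet
    return (fewest, most)


print(*cage_bounds(20))  # 5 10, matching the sample
```
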
From the column 月亮与二进制
Judge Route Circle
Given a sequence of its moves, judge if this robot makes a circle, which means it moves back to the original …
Published 2021-11-23
From the column AI那点小事
10-Sorting 5: PAT Judge (25 points)
The ranklist of PAT is generated from the status list, which shows the scores of the submissions. This time you are supposed to generate the ranklist for PAT.
Published 2020-04-18
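
Generating the ranklist is essentially a multi-key sort. A minimal sketch, assuming the ordering rules of the PAT problem this exercise is based on (higher total score first, then more perfectly solved problems, then smaller user id, with equal totals sharing a rank) and assuming each user's best score per problem has already been collected. The snippet above does not state these rules; rank_list and its input layout are hypothetical, and filtering out users who never submitted compilable code is omitted.

```python
from typing import Dict, List, Tuple


def rank_list(best: Dict[str, List[int]], full: List[int]) -> List[Tuple[int, str, int]]:
    """Return (rank, user_id, total) rows in ranklist order.

    best[user_id][k] is the user's best score on problem k (hypothetical layout);
    full[k] is the full mark of problem k.
    """
    rows = []
    for uid, scores in best.items():
        total = sum(scores)
        perfect = sum(1 for got, mark in zip(scores, full) if got == mark)
        rows.append((total, perfect, uid))
    # Higher total first, then more perfect solves, then smaller id
    # (ids are zero-padded strings, so string order matches numeric order).
    rows.sort(key=lambda r: (-r[0], -r[1], r[2]))
    ranked: List[Tuple[int, str, int]] = []
    for i, (total, _perfect, uid) in enumerate(rows):
        # Users with the same total share the rank of the first of them.
        rank = ranked[-1][0] if ranked and ranked[-1][2] == total else i + 1
        ranked.append((rank, uid, total))
    return ranked


demo = {"00001": [20, 25, 25], "00002": [18, 22, 25], "00003": [20, 20, 25]}
for row in rank_list(demo, [20, 25, 25]):
    print(*row)
```
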
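From the column 自然语言处理
Judge Route Circle
Judge Route Circle. Description: the problem is simple: decide whether the route forms a loop. The number of steps taken "right" must equal the number taken "left", and the number of steps taken "down" must equal the number taken "up".
Published 2018-04-11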
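
The counting argument in the description above translates directly into code. A sketch in Python, assuming the LeetCode 657 move encoding ('U', 'D', 'L', 'R'); judge_circle is a hypothetical name:

```python
from collections import Counter


def judge_circle(moves: str) -> bool:
    """True if the robot ends where it started."""
    count = Counter(moves)   # missing keys count as 0
    # Back at the origin iff right steps cancel left steps and up steps cancel down steps.
    return count["R"] == count["L"] and count["U"] == count["D"]


print(judge_circle("UD"), judge_circle("LL"))  # True False
```
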
From the column 程序编程之旅
HDOJ/HDU 1073 Online Judge (string processing)
Problem Description: Ignatius is building an Online Judge; he has now finished everything except the Judge System. If the two files are absolutely the same, the Judge System returns "Accepted"; else if the only differences between the two files are spaces (' '), tabs ('\t'), or enters ('\n'), the Judge System should return "… Output: For each test case, you should output the result the Judge System should return.
Published 2021-01-21
From the column 大白技术控的技术自留地
Jiudu (九度) Online Judge Problem 1432: Stacking Baskets (叠筐), Solution
Submission URL: http://ac.jobdu.com/problem.php?
Published 2019-03-05
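Evaluating Generative AI Models with Nova LLM
To close this gap, LLM-as-a-judge has emerged as a promising approach that uses the reasoning ability of LLMs to evaluate other models more flexibly and at scale. … Nova LLM-as-a-Judge training method: Nova LLM-as-a-Judge is built through a multi-step training process that includes supervised training and a reinforcement learning stage on public datasets with human preference annotations. … How Nova LLM-as-a-Judge works: 某中心's Nova LLM-as-a-Judge uses an evaluation method called binary overall preference judging. … Next, a PyTorch Estimator launches the evaluation job using 某中心's Nova LLM-as-a-Judge recipe. … Launching the evaluation job: once the dataset is prepared and the evaluation recipe is created, the final step is to launch the SageMaker training job that runs the 某中心 Nova LLM-as-a-Judge evaluation.
Edited 2025-09-09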
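
As the name suggests, binary overall preference judging has each judgment pick one of two responses as better overall, so one natural summary is a win rate. A minimal aggregation sketch, assuming verdicts are stored as the strings "A" and "B" and treating anything else (for example an unparsable judge reply) as invalid. It is not the Nova, SageMaker, or PyTorch Estimator API, and preference_summary is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def preference_summary(verdicts: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Summarize binary preference verdicts between response A and response B."""
    counts = Counter(verdicts)
    a_wins, b_wins = counts["A"], counts["B"]
    decided = a_wins + b_wins
    return {
        "a_wins": a_wins,
        "b_wins": b_wins,
        "invalid": sum(counts.values()) - decided,          # anything that is not "A" or "B"
        "a_win_rate": a_wins / decided if decided else 0.0,  # share of decided comparisons won by A
    }


print(preference_summary(["A", "B", "A", "unparsable"]))
```

Excluding invalid verdicts from the denominator keeps parse failures from silently counting as losses; reporting the invalid count alongside makes that choice visible.
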
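From the column 武培轩的专栏
Judge Route Circle (deciding whether a route forms a loop)
Judge Route Circle (deciding whether a route forms a loop): a robot starts at the initial position (0, 0).
Published 2018-09-28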
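From the column DeepHub IMBA
Multi-Agent Verification Architectures in Practice: From Output Scoring to Process Verification
Pattern 1: Output scoring (LLM-as-Judge). This is the simplest verification architecture: an independent LLM scores the solver's output against a structured rubric; if the score is above a threshold the output is released, and if it is below, the solver is told to retry. Code excerpts (Python, partially elided):
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
judge_llm …
… (task: str, response: str) -> dict:
    chain = JUDGE_PROMPT | judge_llm
    result = chain.invoke …
… Be concise and actionable. """
critique = critic_llm.invoke(critique_prompt)
score = judge_output …
… f"Critique this answer to '{task}':\n{answer_a}\nFind specific flaws." )
synthesis = judge_llm.invoke …
Edited 2026-03-31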
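
The output-scoring pattern (score against a rubric, release above a threshold, otherwise retry) can be sketched without any particular LLM library. Here solver and judge are plain callables standing in for model calls; the names, the 0.7 threshold, the attempt limit, and the {"score", "feedback"} verdict shape are all assumptions for illustration, not the article's LangChain code.

```python
from typing import Callable, Dict, Optional

Solver = Callable[[str, Optional[str]], str]     # (task, feedback) -> answer
Judge = Callable[[str, str], Dict[str, object]]  # (task, answer) -> {"score": ..., "feedback": ...}


def solve_with_judge(task: str, solver: Solver, judge: Judge,
                     threshold: float = 0.7, max_attempts: int = 3) -> Dict[str, object]:
    """Ask the solver, gate its answer on the judge's score, retry with feedback."""
    feedback: Optional[str] = None
    answer, score = "", 0.0
    for attempt in range(1, max_attempts + 1):
        answer = solver(task, feedback)
        verdict = judge(task, answer)
        score = float(verdict["score"])
        if score >= threshold:
            # Above the threshold: release the answer.
            return {"answer": answer, "score": score, "attempts": attempt, "passed": True}
        # Below the threshold: pass the judge's critique back to the solver.
        feedback = str(verdict.get("feedback", ""))
    return {"answer": answer, "score": score, "attempts": max_attempts, "passed": False}


# Toy stand-ins for the two models (hypothetical).
def toy_solver(task: str, feedback: Optional[str]) -> str:
    return "4" if feedback else "5"


def toy_judge(task: str, answer: str) -> Dict[str, object]:
    ok = answer == "4"
    return {"score": 1.0 if ok else 0.2, "feedback": "" if ok else "Recheck the arithmetic."}


result = solve_with_judge("What is 2 + 2?", toy_solver, toy_judge)
print(result)
```

Keeping the judge separate from the solver, as the excerpt describes, means the threshold and retry budget can be tuned without touching either model's prompt.
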
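From the column AI工程落地
TensorRT LLM vs OpenPPL LLM
Supported models and features: PPL LLM supports only three models (baichuan, chatglm, and llama), whereas TensorRT-LLM supports almost all large models, and TensorRT-LLM is more convenient to use. Model quantization: TensorRT-LLM quantizes offline and supports more quantization methods (SmoothQuant, weight-only, AWQ, and others); PPL LLM quantizes in real time (i8i8) and supports quantizing the whole network together. Model deployment: after quantization, TensorRT-LLM does not need to deploy an intermediate model and goes straight into the compiler, and some models support ONNX visualization; PPL LLM needs neither a deploy step nor compilation and calls operators directly from ONNX. … /docs/llama_guide.md at master · openppl-public/ppl.llm.serving (github.com). TensorRT-LLM pipeline: original model → quantization → compilation. Both frameworks are tensor-parallel. Dependencies: TensorRT-LLM depends on TensorRT, but mainly for single operators (convolution, activation functions, GEMM, etc.), while the fused operators are TensorRT-LLM's own. PPL LLM has no dependencies …
Edited 2023-11-21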
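From the column 技术汇总专栏
Tencent's open-source AI Agent sandbox: Alice writes the code, Bob hunts for bugs, Hunyuan referees. I had 3 hy3 instances pick holes in each other's work inside two Cube Sandboxes
This reversal happens to reveal something: having an LLM write code may be more reliable than having it do arithmetic in its head. … 4. Enter the Judge: what does Hunyuan make of this showdown? … This is a fairly important signal: LLM-as-Judge is reliable on tasks like "read the code and the stdout, then infer the logic", provided you give it the full context (code + stdout) in one go and let it reason rather than retrieve. 5. Why is this story worth telling? … If your Agent pipeline needs both steps, running the code and then reading stdout is far more stable than having the LLM produce the number directly. This is exactly the value Cube Sandbox offers inside an Agent: a real environment in which the LLM can check its own conclusions. … The Judge must get the real stdout, not just PASS/FAIL: if you only tell the Judge "R3 pass rate 15/18", it has no way to tell whether those 3 failures are Alice's fault or Bob's.
Edited 2026-05-28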
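Giving an AI Agent a "judge core": the full pipeline for parallel OJ judging with Cube Sandbox
# scripts/oj_judge.py (excerpt)
JUDGE_HARNESS = r'''import json, time, tracemalloc, traceback, random …
… , timeout=8)  # 3. Run the evaluation, with 8 s as the safety net
stdout = "".join(res.logs.stdout) or ""
if "__JUDGE_RESULT__" in stdout:
    record["judge"] = json.loads(stdout.split("__JUDGE_RESULT__")[-1])
sb.kill()
return record
timeout=8 is the single most important parameter in OJ judging: it turns "a stuck LLM … … a cold start of about 150 ms is entirely fast enough for real-time use; Code Agent self-evaluation: the LLM writes code → Cube executes it → it fails → the traceback is fed back to the LLM → it fixes the code itself → runs again.) … Clone it, change 5 lines, and you can plug in your own LLM's output.
Edited 2026-05-23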
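
The excerpt's core idea (run the submission under a hard timeout, then recover a JSON result that the harness prints after a __JUDGE_RESULT__ marker) can be tried locally. The sketch below swaps Cube Sandbox for a plain Python subprocess, which provides no isolation and must not be used for untrusted code. run_and_judge and the record fields are hypothetical; only the marker string and the 8-second timeout come from the excerpt.

```python
import json
import subprocess
import sys
from typing import Any, Dict

MARKER = "__JUDGE_RESULT__"   # marker string taken from the excerpt


def run_and_judge(harness_code: str, timeout: float = 8.0) -> Dict[str, Any]:
    """Run harness_code in a child Python process and parse its judge result."""
    record: Dict[str, Any] = {"judge": None, "timed_out": False, "returncode": None}
    try:
        proc = subprocess.run(
            [sys.executable, "-c", harness_code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout, so a hung submission
        # costs at most `timeout` seconds.
        record["timed_out"] = True
        return record
    record["returncode"] = proc.returncode
    stdout = proc.stdout or ""
    if MARKER in stdout:
        # Everything after the last marker is expected to be one JSON value.
        record["judge"] = json.loads(stdout.split(MARKER)[-1])
    return record


demo = 'import json; print("noise"); print("__JUDGE_RESULT__" + json.dumps({"passed": 3, "total": 3}))'
print(run_and_judge(demo))
```

Splitting on the last marker lets the submission print freely before the harness reports, which matches how the excerpt uses split(...)[-1].
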
